Multimodal Interaction for Scene Boundary Detection
Abstract
A scene boundary detection method is presented that analyzes both audio and visual information sources and accounts for their inter-relations and coincidence to identify semantically coherent video scenes. Audio analysis segments the audio source into three types of semantic primitives: silence, speech, and music. Speech segments are processed further to locate speaker change points. Video analysis segments the video source into shots. Results from single-source segmentation are in some cases suboptimal. Audio-visual interaction either enhances the single-source findings or extracts high-level semantic information. The aim of this paper is to identify semantically meaningful video scenes by exploiting the temporal correlations of both sources, based on the observation that semantic changes are characterized by significant changes in both information sources. Experiments were carried out on several TV sequences composed of many scenes with differing content, with plenty of commercials in between.
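The abstract does not spell out how the two sources are combined, so the following is only a minimal sketch of the coincidence idea, assuming shot cuts and audio change points are already available as timestamps in seconds. The function name `fuse_boundaries` and the `tolerance` parameter are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def fuse_boundaries(shot_cuts, audio_changes, tolerance=0.5):
    """Keep the shot cuts that have an audio change within `tolerance` seconds.

    Hypothetical fusion rule: a shot cut counts as a scene boundary
    candidate only if the audio source also changes at about the same time.
    """
    audio_sorted = sorted(audio_changes)
    scene_boundaries = []
    for cut in sorted(shot_cuts):
        i = bisect_left(audio_sorted, cut)
        # The nearest audio change is either just before or at/after the cut.
        neighbours = audio_sorted[max(i - 1, 0):i + 1]
        if any(abs(cut - t) <= tolerance for t in neighbours):
            scene_boundaries.append(cut)
    return scene_boundaries


# Illustrative timestamps only, not data from the paper.
shot_cuts = [12.0, 30.5, 47.2, 61.0]
audio_changes = [11.8, 40.0, 61.3]
boundaries = fuse_boundaries(shot_cuts, audio_changes)
```

With these made-up inputs, only the cuts at 12.0 s and 61.0 s have an audio change within half a second, so they are the ones kept. A real system would likely weight the type of audio change (for example, speech to music versus a speaker change) rather than use a single fixed tolerance.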
Similar resources
Multimodal Semantic Analysis and Annotation for Basketball Video
This paper presents a new multiple-modality method for extracting semantic information from basketball video. The visual, motion, and audio information are extracted from video to first generate some low-level video segmentation and classification. Domain knowledge is further exploited for detecting interesting events in the basketball video. For video, both visual and motion prediction informa...
Video Scene Detection by Multimodal Bag of Features
Recent advances in technology have increased the availability of video data, creating a strong demand for efficient systems to manage this kind of content. To make efficient use of video information, first, the data have to be automatically segmented into smaller, manageable and understandable units, like scenes. This article presents a new multimodal video scene segmentation technique. The pro...
Broadcast News Story Boundary Detection Using Visual, Audio and Text Features
News video story segmentation is vital for video summarization, story linking, and curation. We present a multimodal segmentation algorithm which fuses video, audio and text cues for story boundary detection. We show that broadcast news closed captioning is a rich and readily available source that improves story boundary detection. Furthermore, we propose an empirical distribution-based feature...
Augmented World: Real Time Gesture Based Image Processing Tool with Intel RealSense™ Technology
Intel RealSense™ is an exciting new technology that offers innovative multimodal human computer interaction with hand tracking, face tracking, emotion detection, speech synthesis and voice recognition. The technology supports real time background separation by combining the depth and RGB streams of the RealSense™ camera. Several past works have provided robust techniques for background separation...
Compressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard
Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...